Fast algorithms for hiding sensitive high-utility itemsets in privacy-preserving utility mining

Authors

  • Chun-Wei Lin
  • Tsu-Yang Wu
  • Philippe Fournier-Viger
  • Guo Lin
  • Justin Zhijun Zhan
  • Miroslav Voznak
Abstract

High-Utility Itemset Mining (HUIM) is an extension of frequent itemset mining that discovers high-utility itemsets (HUIs), i.e., itemsets yielding a high profit in transaction databases. In recent years, a major issue has arisen: data published or shared by organizations may lead to privacy threats, since sensitive or confidential information may be uncovered by data mining techniques. To address this issue, techniques for privacy-preserving data mining (PPDM) have been proposed. Recently, privacy-preserving utility mining (PPUM) has become an important topic in PPDM. PPUM is the process of hiding sensitive HUIs (SHUIs) appearing in a database, such that the resulting sanitized database does not reveal these itemsets. In the past, the HHUIF and MSICF algorithms were proposed to hide SHUIs, and they are the state-of-the-art approaches for PPUM. In this paper, two novel algorithms, namely Maximum Sensitive Utility-MAximum item Utility (MSU-MAU) and Maximum Sensitive Utility-MInimum item Utility (MSU-MIU), are proposed to minimize the side effects of the sanitization process for hiding SHUIs. The proposed algorithms are designed to efficiently delete SHUIs or decrease their utilities using the concepts of maximum and minimum utility. A projection mechanism is also adopted in both algorithms to speed up the sanitization process. In addition, since the evaluation criteria proposed for PPDM are insufficient and inappropriate for evaluating the sanitization performed by PPUM algorithms, this paper introduces three similarity measures that assess, respectively, the database structure, database utility and item utility of a sanitized database. These criteria are proposed as a new evaluation standard for PPUM. © 2016 Elsevier Ltd. All rights reserved.
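
To make the terms in the abstract concrete, the following is a minimal Python sketch of itemset utility and of hiding one sensitive itemset by lowering item quantities. It is not the MSU-MAU or MSU-MIU algorithm, whose details the abstract does not give: the greedy choice of "largest-utility transaction, then largest-utility item" is only loosely inspired by the "maximum utility" wording, there is no projection mechanism, and all function names (`itemset_utility`, `hide_itemset`, `database_utility`) are hypothetical. Unit profits are assumed to be positive integers and the threshold `min_util` to be positive.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> purchased quantity


def tx_utility(tx: Transaction, profit: Dict[str, int], itemset: FrozenSet[str]) -> int:
    """Utility of `itemset` in one transaction (0 if the transaction lacks any item)."""
    if not all(item in tx for item in itemset):
        return 0
    return sum(tx[item] * profit[item] for item in itemset)


def itemset_utility(db: List[Transaction], profit: Dict[str, int],
                    itemset: FrozenSet[str]) -> int:
    """Utility of `itemset` in the database: sum over supporting transactions."""
    return sum(tx_utility(tx, profit, itemset) for tx in db)


def database_utility(db: List[Transaction], profit: Dict[str, int]) -> int:
    """Total utility of all items in all transactions."""
    return sum(qty * profit[item] for tx in db for item, qty in tx.items())


def hide_itemset(db: List[Transaction], profit: Dict[str, int],
                 itemset: FrozenSet[str], min_util: int) -> List[Transaction]:
    """Naive greedy hiding (illustrative assumption, not the paper's method).

    Repeatedly lowers the quantity of (or deletes) one item of `itemset`
    until the itemset's utility falls below `min_util`.
    """
    if min_util <= 0:
        raise ValueError("min_util must be positive")
    sanitized = [dict(tx) for tx in db]  # leave the original database untouched
    while True:
        current = itemset_utility(sanitized, profit, itemset)
        if current < min_util:
            return sanitized
        # Supporting transaction in which the itemset has the largest utility.
        tx = max(sanitized, key=lambda t: tx_utility(t, profit, itemset))
        # Item of the itemset contributing the most utility in that transaction.
        item = max(sorted(itemset), key=lambda i: tx[i] * profit[i])
        excess = current - min_util + 1              # utility that must be removed
        units = -(-excess // profit[item])           # ceiling division
        if units >= tx[item]:
            del tx[item]                             # transaction no longer supports itemset
        else:
            tx[item] -= units


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": 1}
    db = [{"a": 2, "b": 1}, {"a": 1, "b": 3, "c": 4}, {"b": 2, "c": 1}]
    sensitive = frozenset({"a", "b"})
    clean = hide_itemset(db, profit, sensitive, min_util=15)
    print(itemset_utility(db, profit, sensitive), itemset_utility(clean, profit, sensitive))
    # Ratio of total utility kept: one plausible reading of a "database utility" measure.
    print(database_utility(clean, profit) / database_utility(db, profit))
```

The ratio printed at the end is only one possible interpretation of a database-utility similarity; the abstract names the three measures but does not define them.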

Similar resources

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

An Efficient Method for Protecting High Utility Itemsets in Utility Mining

Privacy preserving data mining (PPDM) has become a popular research direction in data mining. It is an approach to developing algorithms that modify the utility values of the original data in order to protect sensitive information from unauthorized users. Protecting data against illegal access becomes a serious issue when this data is require...

Reducing Side Effects of Hiding Sensitive Itemsets in Privacy Preserving Data Mining

Data mining is traditionally adopted to retrieve and analyze knowledge from large amounts of data. Private or confidential data may be sanitized or suppressed before it is shared or published in public. Privacy preserving data mining (PPDM) has thus become an important issue in recent years. The most general way of PPDM is to sanitize the database to hide the sensitive information. In this pape...

A GA-Based Approach to Hide Sensitive High Utility Itemsets

A GA-based privacy preserving utility mining method is proposed to find appropriate transactions to be inserted into the database for hiding sensitive high utility itemsets. It keeps information loss low while still providing information to data demanders, and it protects the high-risk information in the database. A flexible evaluation function with three factors is designed in the proposed a...

Journal title:
  • Eng. Appl. of AI

Volume 55

Publication date: 2016